STATS 32: Introduction to R for Undergraduates

Kenneth Tay

Oct 2, 2018

Agenda for today

The big data explosion



What is R?

Ross Ihaka & Rob Gentleman, the creators of R

Why learn R?

Reason #1: R was specifically designed for statistics and data analysis.

Map of US obesity rates (Source: stackoverflow.com)
Demo of the Central Limit Theorem (Source: yihui.name)
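As a small illustration of Reason #1, the sketch below simulates the Central Limit Theorem using only base R functions (`runif()`, `replicate()`, `hist()`). It is not the yihui.name demo itself. The sample size of 30, the 1,000 replications, the Uniform(0, 1) population and the seed are arbitrary choices made for this example.

```r
# Minimal CLT sketch in base R (all settings below are illustrative assumptions)
set.seed(32)     # make the simulation reproducible
n <- 30          # size of each sample
reps <- 1000     # number of samples drawn

# Each replicate draws n values from Uniform(0, 1) and records their mean
means <- replicate(reps, mean(runif(n)))

mean(means)      # close to the population mean, 0.5
sd(means)        # close to sqrt(1/12) / sqrt(n), about 0.053
hist(means, breaks = 30, main = "Sample means of Uniform(0, 1) draws")
```

Even though the individual draws are spread evenly over (0, 1), the histogram of sample means should look roughly bell-shaped and centred near 0.5.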

Why learn R?

(Source: Stack Overflow)

Why learn R?

Reason #3a: It’s easy to get started with R.

Why learn R?

Reason #3b: RStudio

Why learn R?

Reason #3c: Community

R-bloggers

Blog aggregator of content contributed by bloggers who write about R

Stack Overflow

Q&A site for programmers

The challenge of learning R

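library(dplyr)  # provides filter() and the %>% pipe
df <- mtcars    # assumed: df is the built-in mtcars data set (mpg = miles per gallon)
# Five ways to keep only the rows of df where mpg > 30:
df[df$mpg > 30, ]          # base R: logical row indexing
with(df, df[mpg > 30, ])   # base R: with() looks up mpg inside df
subset(df, mpg > 30)       # base R: subset()
filter(df, mpg > 30)       # dplyr::filter()
df %>% filter(mpg > 30)    # the same filter() call, written with the pipe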
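To check that the base R versions of this subsetting really return the same rows, the sketch below compares them directly. It assumes, as the `mpg` column suggests, that `df` is the built-in `mtcars` data set, and it uses base R only so that it runs without dplyr installed. The variable names `by_index`, `by_with` and `by_subset` are illustrative.

```r
df <- mtcars  # assumption: df is the built-in mtcars data set

by_index  <- df[df$mpg > 30, ]         # logical row indexing
by_with   <- with(df, df[mpg > 30, ])  # with() supplies mpg from df
by_subset <- subset(df, mpg > 30)      # subset()

isTRUE(all.equal(by_index, by_with))    # TRUE
isTRUE(all.equal(by_index, by_subset))  # TRUE
rownames(by_index)  # "Fiat 128" "Honda Civic" "Toyota Corolla" "Lotus Europa"
```

All three spellings select the same four cars; they differ only in how much typing they need and in whether the column name must be prefixed with `df$`.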

Course objectives

By the end of this course, students will be able to:

Tentative overview of the course

Class logistics

Assignments

What is a variable?

x <- 3      # store the number 3 in x
y <- "abc"  # store the string "abc" in y
x <- y      # x now holds a copy of y's value, "abc"
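The three assignments above can be extended into a short sketch showing two consequences of assignment in R: a variable's type follows whatever value it currently holds, and `x <- y` copies the value rather than linking the two names.

```r
x <- 3
class(x)     # "numeric"

y <- "abc"
x <- y       # x is overwritten with a copy of y's value
x            # [1] "abc"
class(x)     # "character": the type changed along with the value

y <- "xyz"   # changing y afterwards...
x            # ...leaves x unchanged: [1] "abc"
```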

Variable types
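Assuming the types in question are R's most common basic ones (numeric, integer, character and logical), a minimal sketch that creates one value of each and asks R for its class might look like this. The variable names are illustrative.

```r
num <- 3.14    # double-precision number
int <- 42L     # the L suffix makes an integer
chr <- "R"     # character string
lgl <- TRUE    # logical (TRUE or FALSE)

class(num)     # "numeric"
class(int)     # "integer"
class(chr)     # "character"
class(lgl)     # "logical"
```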

Confusion: 123 vs. “123”

How can we tell a numeric variable apart from a character variable whose value consists only of digits?
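Two practical ways to tell them apart are how R prints them (character values appear in quotes) and what `class()` or `is.numeric()` report. The sketch below, using illustrative variables `a` and `b`, also shows that arithmetic works only on the numeric value until the string is converted with `as.numeric()`.

```r
a <- 123       # numeric
b <- "123"     # character: the quotes make it text

print(a)       # [1] 123
print(b)       # [1] "123"  (printed with quotes)
class(a)       # "numeric"
class(b)       # "character"
is.numeric(b)  # FALSE

a + 1          # 124
# Arithmetic on the string fails; catch the error to see its message
tryCatch(b + 1, error = function(e) conditionMessage(e))
# "non-numeric argument to binary operator"

as.numeric(b) + 1  # 124, after converting b to a number
```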

Let’s try R!

Optional material

The R Journal

Biannual open-access journal featuring short to medium-length articles on topics of interest to R users and developers

R-exercises

Website with both tutorials and exercises

DataCamp

Website for learning data science, including R (some courses are free, others are not)